{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# GetFixedPointMultiplierShift\n",
    "\n",
    "源码：`tvm/src/relay/qnn/utils.cc`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```c++\n",
    "/*\n",
    " * \\brief Convert FP32 representation into fixed point representation.\n",
    " * \\param double_multplier The input FP32 number.\n",
    " * \\return The pair of multiplier and shift for fixed point representation.\n",
    " * \\note Converts a floating point number so that it can be represented by\n",
    " *       integers. The representation is\n",
    " *             float_number = (significand) * 2^(exponent)\n",
    " *\n",
    " *       The significand is a number between 0.5 and 1. This is represented by\n",
    " *       an integer number. For example, if it is int32, then the decimal point\n",
    " *       exists between bit 31 and 30 from LSB (or between first and second bit\n",
    " *       from the left).\n",
    " *\n",
    " *       Some examples are\n",
    " *           0.25 = (0.5) * 2^(-1)\n",
    " *           0.125 = (0.5) * 2^(-2)\n",
    " *\n",
    " *       Credit to TFLite reference implementation.\n",
    " */\n",
    "std::pair<int32_t, int32_t> GetFixedPointMultiplierShift(double double_multiplier) {\n",
    "  int32_t significand, exponent;\n",
    "  if (double_multiplier == 0.) {\n",
    "    significand = 0;\n",
    "    exponent = 0;\n",
    "    return std::make_pair(significand, exponent);\n",
    "  }\n",
    "\n",
    "  // Get the significand and exponent.\n",
    "  double significand_d = std::frexp(double_multiplier, &exponent);\n",
    "\n",
    "  // Convert the double significand to int significand, i.e., convert into a\n",
    "  // integer where the decimal point is between bit 31 and 30. This is done by\n",
    "  // multiplying the double value with 2^31 and then casting to int.\n",
    "  significand_d = std::round(significand_d * (1ll << 31));\n",
    "  auto significand_int64 = static_cast<int64_t>(significand_d);\n",
    "  ICHECK_LE(significand_int64, (1ll << 31));\n",
    "  if (significand_int64 == (1ll << 31)) {\n",
    "    significand_int64 /= 2;\n",
    "    ++exponent;\n",
    "  }\n",
    "  ICHECK_LE(significand_int64, std::numeric_limits<int32_t>::max());\n",
    "  significand = static_cast<int32_t>(significand_int64);\n",
    "  return std::make_pair(significand, exponent);\n",
    "}\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "函数 `GetFixedPointMultiplierShift`，它接受双精度浮点数 `double_multiplier` 作为参数，返回包含两个整数的 pair 对象。\n",
    "\n",
    "该函数的作用是将输入的双精度浮点数转换为定点数表示形式，即将其转换为具有固定小数位数的数值。具体来说，它将输入的浮点数分解为尾数和指数两部分，并将尾数转换为整数，使得小数点位于第 31 位和第 30 位之间。然后，将这个整数和指数一起返回。\n",
    "\n",
    "在函数内部，首先判断输入的浮点数是否为零，如果是，则直接返回零值对。否则，使用 `std::frexp` 函数获取浮点数的尾数和指数。接着，将尾数乘以 $2^{31}$，并四舍五入得到整数。如果这个整数等于 $2^{31}$，则将其除以2，并将指数加1。最后，将整数转换为 `int32_t` 类型，并检查其是否小于等于 `int32_t` 的最大值。如果满足条件，则将其和指数一起返回。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`std::frexp` 是 C++ 标准库中的一个函数，用于将一个浮点数分解为尾数和指数。它的原型如下：\n",
    "\n",
    "```cpp\n",
    "double frexp(double x, int* exp);\n",
    "```\n",
    "\n",
    "参数：\n",
    "- `x`：要分解的浮点数。\n",
    "- `exp`：指向一个整数的指针，用于存储分解后的指数(xponent)部分。\n",
    "\n",
    "返回值：\n",
    "- 返回分解后的尾数(Mantissa)部分。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "#include <limits>\n",
    "#include <string>\n",
    "#include <utility>\n",
    "#include <vector>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "std::frexp(x, &n) => 16.4 = 0.5125 * (1 << 5)"
     ]
    }
   ],
   "source": [
    "#include <math.h>\n",
    "#include <iostream>\n",
    "\n",
    "double x, y;\n",
    "int n;\n",
    "x = 16.4;\n",
    "y = frexp(x, &n);\n",
    "std::cout << \"std::frexp(x, &n) => \" \n",
    "         << x << \" = \" << y << \" * \"\n",
    "         << \"(1 << \" << n << \")\";"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "16.400000"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "0.5125 * (1 << 5)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "C++14",
   "language": "C++14",
   "name": "xcpp14"
  },
  "language_info": {
   "codemirror_mode": "text/x-c++src",
   "file_extension": ".cpp",
   "mimetype": "text/x-c++src",
   "name": "c++",
   "version": "14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
